CLAIMS:

1. A method of computer data analysis using neural networks, the
method including:

generating a data representation using a data set, the data set including 
a plurality of attributes, wherein generating the data representation includes: 

modifying the data set using a training algorithm, wherein the 
training algorithm includes growing the data set; and 

performing convergence testing, wherein convergence testing 
checks for convergence of the training algorithm, and wherein the 
modifying of the data set is repeated until convergence of the training 
algorithm occurs; and 

displaying one or more subsets of the data set using the data
representation.

2. A method according to claim 1, further including generating the
data set using input data, and wherein generating the data set includes
formatting the input data and initializing the formatted input data.

3. A method according to claim 2, wherein formatting the input data
further includes creating a container class including a list of data vectors, D,
where $d_i$ is the i-th vector in D, and $d_{i,j}$ is the j-th element of vector i.

4. A method according to claim 2, wherein formatting the input data 
further includes data scaling and binarisation of at least a portion of the data 
set. 

5. A method according to claim 4, wherein data scaling includes 
replacing each element in each data vector in the data set by a scaled 
representation of itself, where: 

$\forall i \in [1, \mathrm{card}(d)],\ \forall d \in D$

6. A method according to claim 4 or 5, wherein binarisation includes 
converting attributes into one or more toggled attribute values. 
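
As an illustration only, binarisation of a categorical attribute can be sketched as one toggled (0/1) column per distinct value. The claim does not fix an encoding, so the function name `binarise` and the one-column-per-category layout are assumptions.

```python
def binarise(values):
    """Convert one categorical column into toggled (0/1) attribute columns.

    Assumption: one output column per distinct category, ordered by sorting.
    """
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows


categories, rows = binarise(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```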

7. A method according to any one of the preceding claims, wherein
performing convergence testing includes testing condition $q(t) < Q_e$.

8. A method according to any one of claims 2 to 7, wherein
initializing the formatted input data includes:

calculating an autocorrelation matrix, K, over the input data set D,

where $K = \frac{1}{\mathrm{card}(D)} \sum_{d \in D} d\,d^{T}$;

finding two longest eigenvectors of K, $e_1$ and $e_2$, where $|e_1| \ge |e_2|$; and

initializing vector values of each element of the data set F by spanning it
with element values of the eigenvectors.
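
A minimal sketch of the two calculations named in this claim, under stated assumptions: "longest eigenvectors" is read here as the eigenvectors with the largest eigenvalues, and they are found by power iteration with deflation, a technique the claim does not itself specify. All function names are hypothetical.

```python
import math


def autocorrelation(data):
    """K = (1 / card(D)) * sum over d in D of the outer product d d^T."""
    n = len(data[0])
    k = [[0.0] * n for _ in range(n)]
    for d in data:
        for i in range(n):
            for j in range(n):
                k[i][j] += d[i] * d[j] / len(data)
    return k


def dominant_eigenpair(k, iterations=500):
    """Power iteration for a symmetric matrix: (eigenvalue, unit eigenvector).

    Limitation: fails if the start vector is orthogonal to the dominant
    eigenvector; the uneven start vector makes that unlikely, not impossible.
    """
    n = len(k)
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v.
    eigenvalue = sum(v[i] * sum(k[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v


def two_principal_eigenvectors(data):
    """Return ((lambda1, e1), (lambda2, e2)) of the autocorrelation matrix."""
    k = autocorrelation(data)
    lam1, e1 = dominant_eigenpair(k)
    n = len(k)
    # Deflation: subtract the first component so the second one dominates.
    deflated = [[k[i][j] - lam1 * e1[i] * e1[j] for j in range(n)] for i in range(n)]
    lam2, e2 = dominant_eigenpair(deflated)
    return (lam1, e1), (lam2, e2)
```

In practice a linear algebra library would normally replace the hand-written iteration; the pure-Python version is used here only to keep the sketch self-contained.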

9. A method according to claim 8, wherein initializing the vector 
values includes: 

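$F_{\langle F_R, 1 \rangle} := 0$

$F_{\langle F_R, F_C \rangle} := e_1$

$\forall c \in [2, F_C - 1],\quad F_{\langle 1, c \rangle} := \frac{c}{F_C} F_{\langle 1, F_C \rangle} + \frac{F_C - c}{F_C} F_{\langle 1, 1 \rangle}$

$\forall r \in [2, F_R - 1],\quad F_{\langle r, 1 \rangle} := \frac{r}{F_R} F_{\langle F_R, 1 \rangle} + \frac{F_R - r}{F_R} F_{\langle 1, 1 \rangle}$

$\forall r \in [2, F_R - 1],\quad F_{\langle r, F_C \rangle} := \frac{r}{F_R} F_{\langle F_R, F_C \rangle} + \frac{F_R - r}{F_R} F_{\langle 1, F_C \rangle}$

$\forall r \in [2, F_R - 1],\ \forall c \in [2, F_C - 1],\quad F_{\langle r, c \rangle} := \frac{c}{F_C} F_{\langle r, F_C \rangle} + \frac{F_C - c}{F_C} F_{\langle r, 1 \rangle}$.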

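10. A method according to any one of the preceding claims, wherein
the data set includes a plurality of data set nodes, and wherein growing the data
set includes:

finding $K_q$ for each of the data set nodes, where $K_q$ is the node
with the highest average quantization error, $\arg\max_q \{\bar{q}(t)_{K_q}\}$, for
each of the data set nodes, where $\bar{q}(t)_{K_q} = \frac{1}{t-1} \sum_{i=1}^{t-1} q(i)_{K_q}$ is the average
quantization error for node q, where:

$K_x = \arg\max \{ \lVert K_q - K_{\langle r(q)-1,\, c(q) \rangle} \rVert,\ \lVert K_q - K_{\langle r(q)+1,\, c(q) \rangle} \rVert \}$

$K_y = \arg\max \{ \lVert K_q - K_{\langle r(q),\, c(q)-1 \rangle} \rVert,\ \lVert K_q - K_{\langle r(q),\, c(q)+1 \rangle} \rVert \}$

if $\lVert K_y - K_q \rVert < \lVert K_x - K_q \rVert$ then

$n_r = r(y)$ if $r(y) < r(c)$, else $n_r = r(c)$; and
$n_c = c(y)$;

else $n_r = r(y)$; $n_c = c(x)$ if $c(x) < c(c)$, else $n_c = c(c)$;
inserting a new row and column after row $n_r$ and column $n_c$; and
interpolating new attribute values for the newly inserted node

vectors using: $K_{\langle r, n_c \rangle} = (K_{\langle r, n_c - 1 \rangle} + K_{\langle r, n_c + 1 \rangle}) \frac{\alpha}{2}$ and

$K_{\langle n_r, c \rangle} = (K_{\langle n_r - 1, c \rangle} + K_{\langle n_r + 1, c \rangle}) \frac{\alpha}{2}$, where $\alpha \in U(0, 1)$.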

11. A method according to any one of the preceding claims, wherein
the training algorithm further includes:

$t = t + 1$;

$\forall d \in D$;

if ($t < 50$ or afterGrow)

    $p_d = \arg\min_{\langle r, c \rangle} \lVert d - F_{\langle r, c \rangle} \rVert$

    afterGrow = false

else

    $p_d$ = FindSCWS($d$)

call function: FindNeighborhoodPatterns($p$)
call function: BatchUpdateMatchVectors

if (MayGrow($t$) and $t < t_{\max}$), call function: GrowKF.

12. A method according to any one of the preceding claims, wherein
displaying one or more subsets of the data set includes using a composite view
to view multiple attributes, wherein an additional attribute image is created, the
additional attribute image displaying a union of a selected set of attributes.

13. A method according to claim 12, wherein using a composite view
further includes:

constructing an attribute matrix; and

selecting a highest value for each attribute value from the selected set of
attributes.
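
One possible reading of the composite view, sketched under assumptions: each attribute image is a grid of per-node values on the same map, and the union image takes, per node, the highest value among the selected attributes. The function name and the dictionary layout are hypothetical.

```python
def composite_view(attribute_images, selected):
    """Build a union image: per node, the highest value across selected attributes.

    attribute_images maps an attribute name to a rows x cols grid of values.
    """
    chosen = [attribute_images[name] for name in selected]
    rows, cols = len(chosen[0]), len(chosen[0][0])
    return [[max(img[r][c] for img in chosen) for c in range(cols)]
            for r in range(rows)]


images = {
    "income": [[0.1, 0.9], [0.5, 0.2]],
    "age":    [[0.3, 0.4], [0.1, 0.8]],
}
print(composite_view(images, ["income", "age"]))  # [[0.3, 0.9], [0.5, 0.8]]
```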

14. A method according to any one of the preceding claims, wherein
displaying one or more subsets of the data set includes using a range filter to
select regions on the data representation and filter out nodes based on defined
value ranges.
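
A range filter of this kind could look like the following sketch, which assumes an attribute image stored as a grid of per-node values and an inclusive value range. The function name and the returned set of (row, column) pairs are illustrative choices, not taken from the claims.

```python
def range_filter(attribute_image, low, high):
    """Return the (row, col) positions of nodes whose value lies in [low, high]."""
    return {(r, c)
            for r, row in enumerate(attribute_image)
            for c, value in enumerate(row)
            if low <= value <= high}


image = [[0.1, 0.9], [0.5, 0.2]]
print(sorted(range_filter(image, 0.2, 0.6)))  # [(1, 0), (1, 1)]
```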

15. A method according to any one of the preceding claims, wherein 
displaying one or more subsets of the data set includes using a zooming 
function, wherein the zooming function includes: 

making a selection of nodes to form a base reference of interest;

defining a set of data records from a second data set; 
matching the second data set to the data representation; 
flagging all records that are linked to the matched region; and 
generating a second data representation using the flagged records. 

16. A method according to any one of the preceding claims, wherein
displaying one or more subsets of the data set includes using visual scaling,
changing the minimum and maximum values used to calculate a colour
progression used to visualize at least one of the plurality of attributes, and
re-interpolating the active colour ranges over the new valid range of attribute
values.
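
Visual scaling can be pictured with a small sketch that assumes a linear two-colour progression in RGB. Changing `vmin` and `vmax` re-interpolates the colours over the new range; the function name, the default colours and the clamping of out-of-range values are all assumptions.

```python
def colour_for(value, vmin, vmax, start=(0, 0, 255), end=(255, 0, 0)):
    """Linearly interpolate an RGB colour for value over [vmin, vmax]."""
    if vmax == vmin:
        t = 0.0
    else:
        # Clamp so values outside the chosen range saturate at the end colours.
        t = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


print(colour_for(0, 0, 10))   # (0, 0, 255)
print(colour_for(10, 0, 10))  # (255, 0, 0)
print(colour_for(5, 0, 5))    # (255, 0, 0) after narrowing the range to [0, 5]
```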

17. A method according to any one of the preceding claims, wherein
displaying one or more subsets of the data set includes using a labeling engine
to:

link attribute columns in an input file to attributes in the data
representation;

select attributes from the input file to be used for labelling;
determine with which row and column each row in the input file is
associated; and

place labels on the data representation.

18. A method according to any one of the preceding claims, wherein
displaying one or more subsets of the data set includes using an advanced
search function to:

read a set of data records from a data source;
match attribute columns from the set of data records to attributes in the
data representation; and

display a list of all records that are associated with nodes that are part of
the active selection on the data representation.
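
The search step can be sketched as follows, assuming that a record is "associated with" the node whose vector is nearest to it in squared Euclidean distance. The claims do not define the association, so this choice and both function names are hypothetical.

```python
def best_matching_node(nodes, record):
    """Return the (row, col) of the node vector closest to record."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(nodes):
        for c, vector in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(vector, record))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def search_selection(nodes, records, selection):
    """List the records whose nearest node is part of the active selection."""
    return [rec for rec in records if best_matching_node(nodes, rec) in selection]


nodes = [[[0.0, 0.0], [1.0, 1.0]]]          # a 1 x 2 map of 2-element node vectors
records = [[0.1, 0.0], [0.9, 1.0]]
print(search_selection(nodes, records, {(0, 1)}))  # [[0.9, 1.0]]
```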

19. A method according to any one of the preceding claims, wherein
displaying one or more subsets of the data set includes using equal distance
averaging (EDA), wherein equal distance averaging includes:

finding the node vector, n, in the data representation that most closely
represents the input data vector, d:
$n = \arg\min_{K_{\langle r, c \rangle}} \lVert K_{\langle r, c \rangle} - d \rVert,\ \forall r \in [1, K_R],\ \forall c \in [1, K_C]$; and

replacing missing entries in d with the corresponding entries from n.

20. A method according to claim 19, wherein the equal distance
averaging further includes:

building a list of the data representation nodes values, M, such that for
each element m of M, $\lVert m - d \rVert = 0$;

if M is empty, then replace each missing entry in d with corresponding
entries in n; and

if M is not empty, then replace each missing entry in d with the average
value of the corresponding position of all the elements in M.
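
The two-case rule of equal distance averaging can be sketched as below. Two assumptions are made that the claims do not state: missing entries are marked with `None`, and distances are computed over the non-missing entries only. The function name is hypothetical.

```python
def eda_impute(nodes, d):
    """Fill the None entries of d from the map using the EDA two-case rule."""
    known = [i for i, v in enumerate(d) if v is not None]
    flat = [vec for row in nodes for vec in row]

    def dist(vec):
        # Squared Euclidean distance over the entries that d actually has.
        return sum((vec[i] - d[i]) ** 2 for i in known)

    n = min(flat, key=dist)                      # closest node vector
    m = [vec for vec in flat if dist(vec) == 0]  # node vectors at zero distance
    filled = list(d)
    for i, v in enumerate(d):
        if v is None:
            if m:
                filled[i] = sum(vec[i] for vec in m) / len(m)
            else:
                filled[i] = n[i]
    return filled


nodes = [[[1.0, 2.0], [1.0, 4.0], [5.0, 6.0]]]
print(eda_impute(nodes, [1.0, None]))  # [1.0, 3.0]  (average over M)
print(eda_impute(nodes, [4.0, None]))  # [4.0, 6.0]  (M empty, use n)
```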

21. A method according to any one of the preceding claims, wherein
the data representation includes a knowledge filter.

22. A method of computer data analysis using neural networks, the 
method including: 

generating a data set D , the data set including a plurality of attributes 
and a plurality of data set nodes; 
initializing the data set, initializing the data set including:

calculating an autocorrelation matrix, K, over the input data set D,

where $K = \frac{1}{\mathrm{card}(D)} \sum_{d \in D} d\,d^{T}$;

finding two longest eigenvectors of K, $e_1$ and $e_2$, where $|e_1| \ge |e_2|$;
and

initializing vector values of each element of a data representation
F by spanning it with element values of the eigenvectors;
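generating a data representation using a training algorithm, wherein the
training algorithm includes growing the data set, growing the data set including:

finding $K_q$ for each of the data set nodes, where $K_q$ is the node
with the highest average quantization error, $\arg\max_q \{\bar{q}(t)_{K_q}\}$, for
each of the data set nodes, where $\bar{q}(t)_{K_q} = \frac{1}{t-1} \sum_{i=1}^{t-1} q(i)_{K_q}$ is the average
quantization error for node q, where:

$K_x = \arg\max \{ \lVert K_q - K_{\langle r(q)-1,\, c(q) \rangle} \rVert,\ \lVert K_q - K_{\langle r(q)+1,\, c(q) \rangle} \rVert \}$

$K_y = \arg\max \{ \lVert K_q - K_{\langle r(q),\, c(q)-1 \rangle} \rVert,\ \lVert K_q - K_{\langle r(q),\, c(q)+1 \rangle} \rVert \}$

if $\lVert K_y - K_q \rVert < \lVert K_x - K_q \rVert$ then

$n_r = r(y)$ if $r(y) < r(c)$, else $n_r = r(c)$; and
$n_c = c(y)$;

else $n_r = r(y)$; $n_c = c(x)$ if $c(x) < c(c)$, else $n_c = c(c)$;
inserting a new row and column after row $n_r$ and column $n_c$;
interpolating new attribute values for the newly inserted node

vectors using: $K_{\langle r, n_c \rangle} = (K_{\langle r, n_c - 1 \rangle} + K_{\langle r, n_c + 1 \rangle}) \frac{\alpha}{2}$ and

$K_{\langle n_r, c \rangle} = (K_{\langle n_r - 1, c \rangle} + K_{\langle n_r + 1, c \rangle}) \frac{\alpha}{2}$, where $\alpha \in U(0, 1)$;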

performing convergence testing, wherein convergence testing checks for 
convergence of the training algorithm, and wherein the training algorithm is 
repeated until convergence of the training algorithm occurs; and 

displaying one or more subsets of the data set using the data
representation.

23. A method according to claim 22, wherein initializing the vector 
values further includes: 
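
$F_{\langle 1, F_C \rangle} := e_1 + e_2$

$\forall c \in [2, F_C - 1],\quad F_{\langle 1, c \rangle} := \frac{c}{F_C} F_{\langle 1, F_C \rangle} + \frac{F_C - c}{F_C} F_{\langle 1, 1 \rangle}$

$\forall c \in [2, F_C - 1],\quad F_{\langle F_R, c \rangle} := \frac{c}{F_C} F_{\langle F_R, F_C \rangle} + \frac{F_C - c}{F_C} F_{\langle F_R, 1 \rangle}$

$\forall r \in [2, F_R - 1],\quad F_{\langle r, 1 \rangle} := \frac{r}{F_R} F_{\langle F_R, 1 \rangle} + \frac{F_R - r}{F_R} F_{\langle 1, 1 \rangle}$

$\forall r \in [2, F_R - 1],\quad F_{\langle r, F_C \rangle} := \frac{r}{F_R} F_{\langle F_R, F_C \rangle} + \frac{F_R - r}{F_R} F_{\langle 1, F_C \rangle}$

$\forall r \in [2, F_R - 1],\ \forall c \in [2, F_C - 1],\quad F_{\langle r, c \rangle} := \frac{c}{F_C} F_{\langle r, F_C \rangle} + \frac{F_C - c}{F_C} F_{\langle r, 1 \rangle}$.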

24. A method according to claim 22 or 23, wherein the training
algorithm further includes:

$t = t + 1$;

$\forall d \in D$;

if ($t < 50$ or afterGrow)

    $p_d = \arg\min_{\langle r, c \rangle} \lVert d - F_{\langle r, c \rangle} \rVert$

    afterGrow = false

else

    $p_d$ = FindSCWS($d$)

call function: FindNeighborhoodPatterns($p$)
call function: BatchUpdateMatchVectors

if (MayGrow($t$) and $t < t_{\max}$), call function: GrowKF.

25. A method according to claim 22, 23, or 24, wherein performing
convergence testing includes testing condition $q(t) < Q_e$.

26. A method according to any one of claims 22 to 25, wherein
displaying one or more subsets of the data set includes using a composite view
to view multiple attributes, wherein an additional attribute image is created, the
additional attribute image displaying a union of a selected set of attributes.

27. A method according to claim 26, wherein using a composite view
further includes:

constructing an attribute matrix; and

selecting a highest value for each attribute value from the selected set of
attributes.

28. A method according to any one of claims 22 to 27, wherein
displaying one or more subsets of the data set includes using a range filter to
select regions on the data representation and filter out nodes based on defined
value ranges.

29. A method according to any one of claims 22 to 28, wherein 
displaying one or more subsets of the data set includes using a zooming 
function, wherein the zooming function includes: 

making a selection of nodes to form a base reference of interest;

defining a set of data records from a second data set;
matching the second data set to the data representation;
flagging all records that are linked to the matched region; and
generating a second data representation using the flagged records. 

30. A method according to any one of claims 22 to 29, wherein
displaying one or more subsets of the data set includes using visual scaling,
changing the minimum and maximum values used to calculate a colour
progression used to visualize at least one of the plurality of attributes, and
re-interpolating the active colour ranges over the new valid range of attribute
values.

31. A method according to any one of claims 22 to 30, wherein
displaying one or more subsets of the data set includes using a labeling engine
to:

link attribute columns in an input file to attributes in the data
representation;

select attributes from the input file to be used for labelling;
determine with which row and column each row in the input file is
associated; and

place labels on the data representation.

32. A method according to any one of claims 22 to 31, wherein
displaying one or more subsets of the data set includes using an advanced
search function to:

read a set of data records from a data source;
match attribute columns from the set of data records to attributes in the
data representation; and

display a list of all records that are associated with nodes that are part of
the active selection on the data representation.

33. A method according to any one of claims 22 to 32, wherein
displaying one or more subsets of the data set includes using equal distance
averaging (EDA), wherein equal distance averaging includes:

finding the node vector, n, in the data representation that most closely
represents the input data vector, d:
$n = \arg\min_{K_{\langle r, c \rangle}} \lVert K_{\langle r, c \rangle} - d \rVert,\ \forall r \in [1, K_R],\ \forall c \in [1, K_C]$; and

replacing missing entries in d with the corresponding entries from n.

34. A method according to claim 33, wherein the equal distance
averaging further includes:

building a list of the data representation nodes values, M, such that for
each element m of M, $\lVert m - d \rVert = 0$;

if M is empty, then replace each missing entry in d with corresponding
entries in n; and

if M is not empty, then replace each missing entry in d with the average
value of the corresponding position of all the elements in M.

35. A method according to any one of claims 22 to 34, wherein the
data representation is a knowledge filter.

36. A method according to any one of the preceding claims wherein 
the data representation includes a latent model of the data set. 

37. A system for performing data analysis using neural networks, the
system including:

one or more processors;

one or more memories coupled to the one or more processors; and

program instructions stored in the one or more memories, the one or
more processors being operable to execute the program instructions, the
program instructions including:

generating a data representation using a data set, the data set including
a plurality of attributes, wherein generating the data representation includes:

modifying the data set using a training algorithm, wherein the
training algorithm includes growing the data set; and

performing convergence testing, wherein convergence testing
checks for convergence of the training algorithm, and wherein the
modifying of the data set is repeated until convergence of the training
algorithm occurs; and

displaying one or more subsets of the data set using the data
representation.

38. A system according to claim 37, wherein performing convergence
testing includes testing condition $q(t) < Q_e$.

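39. A system according to claim 37 or 38, wherein the data set
includes a plurality of data set nodes, and wherein growing the data set
includes:

finding $K_q$ for each of the data set nodes, where $K_q$ is the node
with the highest average quantization error, $\arg\max_q \{\bar{q}(t)_{K_q}\}$, for
each of the data set nodes, where $\bar{q}(t)_{K_q} = \frac{1}{t-1} \sum_{i=1}^{t-1} q(i)_{K_q}$ is the average
quantization error for node q, where:

$K_x = \arg\max \{ \lVert K_q - K_{\langle r(q)-1,\, c(q) \rangle} \rVert,\ \lVert K_q - K_{\langle r(q)+1,\, c(q) \rangle} \rVert \}$

$K_y = \arg\max \{ \lVert K_q - K_{\langle r(q),\, c(q)-1 \rangle} \rVert,\ \lVert K_q - K_{\langle r(q),\, c(q)+1 \rangle} \rVert \}$

if $\lVert K_y - K_q \rVert < \lVert K_x - K_q \rVert$ then

$n_r = r(y)$ if $r(y) < r(c)$, else $n_r = r(c)$; and
$n_c = c(y)$;

else $n_r = r(y)$; $n_c = c(x)$ if $c(x) < c(c)$, else $n_c = c(c)$;
inserting a new row and column after row $n_r$ and column $n_c$; and
interpolating new attribute values for the newly inserted node
vectors using: $K_{\langle r, n_c \rangle} = (K_{\langle r, n_c - 1 \rangle} + K_{\langle r, n_c + 1 \rangle}) \frac{\alpha}{2}$ and

$K_{\langle n_r, c \rangle} = (K_{\langle n_r - 1, c \rangle} + K_{\langle n_r + 1, c \rangle}) \frac{\alpha}{2}$, where $\alpha \in U(0, 1)$.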

40. A system according to claim 37, 38, or 39, wherein the training
algorithm further includes:

$t = t + 1$;

$\forall d \in D$;

if ($t < 50$ or afterGrow)

    $p_d = \arg\min_{\langle r, c \rangle} \lVert d - F_{\langle r, c \rangle} \rVert$

    afterGrow = false

else

    $p_d$ = FindSCWS($d$)

call function: FindNeighborhoodPatterns($p$)
call function: BatchUpdateMatchVectors

if (MayGrow($t$) and $t < t_{\max}$), call function: GrowKF.

41. A system according to any one of claims 37 to 40, wherein the
program instructions further include: displaying one or more subsets of the data
set using a composite view to view multiple attributes, wherein an
additional attribute image is created, the additional attribute image displaying a
union of a selected set of attributes.

42. A system according to any one of claims 37 to 41, wherein the
program instructions further include:

constructing an attribute matrix; and
selecting a highest value for each attribute value from the selected set of
attributes.

43. A system according to any one of claims 37 to 42, wherein
displaying one or more subsets of the data set includes using a range filter to
select regions on the data representation and filter out nodes based on defined
value ranges.

44. A system according to any one of claims 37 to 43, wherein
displaying one or more subsets of the data set includes using a zooming
function, wherein the program instructions further include:

making a selection of nodes to form a base reference of interest;
defining a set of data records from a second data set;
matching the second data set to the data representation;
flagging all records that are linked to the matched region; and
generating a second data representation using the flagged records.

45. A system according to any one of claims 37 to 44, wherein
displaying one or more subsets of the data set includes using visual scaling,
wherein the program instructions further include:

changing the minimum and maximum values used to calculate a colour
progression used to visualize at least one of the plurality of attributes;

and re-interpolating the active colour ranges over the new valid range of
attribute values.

46. A system according to any one of claims 37 to 45, wherein
displaying one or more subsets of the data set includes using a labeling engine,
wherein the program instructions further include:

linking attribute columns in an input file to attributes in the data
representation;

selecting attributes from the input file to be used for labelling;
determining with which row and column each row in the input file is
associated; and

placing labels on the data representation.

47. A system according to any one of claims 37 to 46, wherein
displaying one or more subsets of the data set includes using an advanced
search engine, wherein the program instructions further include:

displaying one or more subsets of the data set includes using an
advanced search function to:
read a set of data records from a data source;

match attribute columns from the set of data records to attributes in the
data representation; and

display a list of all records that are associated with nodes that are part of
the active selection on the data representation.

48. A system according to any one of claims 37 to 47, wherein
displaying one or more subsets of the data set includes using equal distance
averaging (EDA), wherein the program instructions further include:

finding the node vector, n, in the data representation that most closely
represents the input data vector, d:

$n = \arg\min_{K_{\langle r, c \rangle}} \lVert K_{\langle r, c \rangle} - d \rVert,\ \forall r \in [1, K_R],\ \forall c \in [1, K_C]$; and

replacing missing entries in d with the corresponding entries from n.

49. A system according to claim 48, wherein the program instructions
further include:

building a list of the data representation nodes values, M, such that for
each element m of M, $\lVert m - d \rVert = 0$;

if M is empty, then replace each missing entry in d with corresponding
entries in n; and

if M is not empty, then replace each missing entry in d with the average
value of the corresponding position of all the elements in M.

50. A system according to any one of claims 37 to 49, wherein the
data representation includes a knowledge filter.

51. A system according to any one of claims 37 to 50, wherein the 
data representation includes a latent model of the data set. 

52. A computer program product for computer data analysis using
neural networks, the computer program product including:

computer-readable program code for generating a data representation
using a data set, the data set including a plurality of attributes, wherein
generating the data representation includes:

modifying the data set using a training algorithm, wherein the
training algorithm includes growing the data set; and

performing convergence testing, wherein convergence testing
checks for convergence of the training algorithm, and wherein the
modifying of the data set is repeated until convergence of the training
algorithm occurs; and

computer-readable program code for displaying one or more subsets of
the data set using the data representation.

53. A computer program product according to claim 52, wherein the
data set includes a plurality of data set nodes, and the computer program
product further including computer-readable program code for growing the data
set including:

finding $K_q$ for each of the data set nodes, where $K_q$ is the node with the
highest average quantization error, $\arg\max_q \{\bar{q}(t)_{K_q}\}$, for each of
the data set nodes, where $\bar{q}(t)_{K_q} = \frac{1}{t-1} \sum_{i=1}^{t-1} q(i)_{K_q}$ is the average
quantization error for node q, where:

$K_x = \arg\max \{ \lVert K_q - K_{\langle r(q)-1,\, c(q) \rangle} \rVert,\ \lVert K_q - K_{\langle r(q)+1,\, c(q) \rangle} \rVert \}$

$K_y = \arg\max \{ \lVert K_q - K_{\langle r(q),\, c(q)-1 \rangle} \rVert,\ \lVert K_q - K_{\langle r(q),\, c(q)+1 \rangle} \rVert \}$

if $\lVert K_y - K_q \rVert < \lVert K_x - K_q \rVert$ then

$n_r = r(y)$ if $r(y) < r(c)$, else $n_r = r(c)$; and
$n_c = c(y)$;

else $n_r = r(y)$; $n_c = c(x)$ if $c(x) < c(c)$, else $n_c = c(c)$;
inserting a new row and column after row $n_r$ and column $n_c$; and

interpolating new attribute values for the newly inserted node

vectors using: $K_{\langle r, n_c \rangle} =$

54. A computer program product according to claim 52 or 53, wherein
the data representation includes a latent model of the data set.

55. An apparatus for performing data analysis using neural networks,
the apparatus including:

means for representing a data set, the data set including a plurality of
attributes;

means for generating the representation means using the data set,
wherein generating the representation means includes:

modifying the data set using a training algorithm, wherein the
training algorithm includes growing the data set; and

performing convergence testing, wherein convergence testing
checks for convergence of the training algorithm, and wherein the
modifying of the data set is repeated until convergence of the training
algorithm occurs; and

means for displaying one or more subsets of the data set using the
modified data representation.

56. An apparatus according to claim 55, wherein the representation 
means includes a latent model of the data set. 

57. A method of computer data analysis using neural networks
substantially as herein described with reference to the accompanying drawings.

58. A system for performing data analysis using neural networks 
substantially as herein described with reference to the accompanying drawings. 

59. A computer program product for computer data analysis using
neural networks substantially as herein described with reference to the
accompanying drawings.

60. An apparatus for performing data analysis using neural networks
substantially as herein described with reference to the accompanying drawings.



